We study the asymptotic behavior of posterior distributions. We present general posterior convergence rate theorems, which extend several results on posterior convergence rates provided by Ghosal and Van der Vaart (2000), Shen and Wasserman (2001) and Walker, Lijoi and Prunster (2007). Our main tools are the Hausdorff $\alpha$-entropy introduced by Xing and Ranneby (2008) and a new notion of prior concentration, which is a slight improvement of the usual prior concentration provided by Ghosal and Van der Vaart (2000). We apply our results to several statistical models.



1. Introduction. Recently, a major theoretical advance has occurred in the theory of Bayesian consistency for infinite-dimensional models. Schwartz (1965) first proved that, if the true density function is in the Kullback-Leibler support of the prior distribution, then the sequence of posterior distributions accumulates in all weak neighborhoods of the true density function. It is known that the condition of positivity of prior mass on each Kullback-Leibler neighborhood in Schwartz's theorem is not a necessary condition. When one considers problems of density estimation, it is natural to ask for the strong consistency of Bayesian procedures. Sufficient conditions for strong Hellinger consistency and for evaluating consistency rates have recently been developed by many authors. In this paper we study the problem of determining whether the posterior distributions accumulate in Hellinger neighborhoods of the true density function. The rate of this convergence can be measured by the size of the smallest shrinking Hellinger balls around the true density function outside of which the posterior masses tend to zero as the sample size increases to infinity. By the fundamental works of Ghosal, Ghosh and Van der Vaart (2000) and Shen and Wasserman (2001), we know that the convergence rate of posterior distributions is completely determined by two quantities: the rate of the metric entropy and the prior concentration rate. Roughly speaking, the rate of the metric entropy describes how large the model is, and the prior concentration rate depends on prior masses near the true distribution. Since the true distribution is unknown, the latter assumption actually requires that the prior distribution spreads its mass "uniformly" over the whole density space. Another elegant approach for determining the convergence rate was provided by Walker (2004), who obtained a sufficient condition for strong consistency by using summability of the square root of the prior probability instead of the metric entropy method. In this paper, in dealing with the rate of metric entropies we shall apply the Hausdorff $\alpha$-entropy introduced by Xing and Ranneby (2008), which is much smaller than the widely used metric entropies and the bracketing entropy. For some important prior distributions of statistical models the Hausdorff $\alpha$-entropies of all sieves are uniformly bounded, whereas it is generally impossible to get uniform boundedness of metric entropies of large sieves. The application of the Hausdorff $\alpha$-entropy leads to refinements of several theorems on posterior convergence rates: for instance, the well-known assumptions on metric entropies and summability of the square root of the prior probability are weakened, which in particular strengthens Theorem 5 of Ghosal et al. (2007b) into Corollary 3 of this paper. To handle the prior concentration rate, we shall apply a new notion of prior concentration. Our approach is a slight improvement of the prior concentration provided by Ghosal et al. (2000), and moreover the proof of Lemma 1, on which the approach is based, is quite simple. Finally, to get posterior convergence at the optimal rate $1/\sqrt{n}$, we give an extension of Ghosal et al. (2000, Theorem 2.4), in which the universal testing constant is replaced by any fixed constant.

An outline of this paper is as follows. In Section 2 we define the Hausdorff $\alpha$-entropy with respect to a given prior and then present general theorems for the determination of posterior convergence rates. We also give a new approach to compute concentration rates. In Section 3 we apply our results to Bernstein polynomial priors, priors based on uniform distribution, log spline models and finite-dimensional models, which leads to some improvements on known results for these models. The proofs of the main results are contained in Section 4.



2. Notations and Theorems. We consider a family of probability measures dominated by a $\sigma$-finite measure $\mu$ in $\mathbb{X}$, a Polish space endowed with a $\sigma$-algebra $\mathcal{X}$. Let $X_1, X_2, \dots, X_n$ stand for an independent identically distributed (i.i.d.) sample of $n$ random variables, taking values in $\mathbb{X}$ and having a common probability density function $f_0$ with respect to the measure $\mu$. Denote by $F_0^\infty$ the infinite product distribution of the probability distribution $F_0$ associated with $f_0$. For two probability densities $f$ and $g$ we denote the Hellinger distance
\[ H(f,g) = \Big( \int_{\mathbb{X}} \big( \sqrt{f(x)} - \sqrt{g(x)} \big)^2 \, \mu(dx) \Big)^{1/2} \]
and the Kullback-Leibler divergence
\[ K(f,g) = \int_{\mathbb{X}} f(x) \log \frac{f(x)}{g(x)} \, \mu(dx). \]
Assume that the space $\mathbb{F}$ of probability density functions is separable with respect to the Hellinger metric and that $\mathcal{F}$ is the Borel $\sigma$-algebra of $\mathbb{F}$. Given a prior distribution $\Pi$ on $\mathbb{F}$, the posterior distribution $\Pi_n$ is a random probability measure with the following expression
\[ \Pi_n(A) = \frac{\int_A \prod_{i=1}^n f(X_i) \, \Pi(df)}{\int_{\mathbb{F}} \prod_{i=1}^n f(X_i) \, \Pi(df)} = \frac{\int_A R_n(f) \, \Pi(df)}{\int_{\mathbb{F}} R_n(f) \, \Pi(df)} \]
for all measurable subsets $A \subset \mathbb{F}$, where $R_n(f) = \prod_{i=1}^n \{ f(X_i)/f_0(X_i) \}$ is the likelihood ratio.

In other words, the posterior distribution $\Pi_n$ is the conditional distribution of $\Pi$ given the observations $X_1, X_2, \dots, X_n$. If the posterior distribution $\Pi_n$ concentrates on arbitrarily small neighborhoods of the true density function $f_0$ almost surely or in probability, then it is said to be consistent at $f_0$ almost surely or in probability, respectively. Throughout this paper, almost sure convergence and convergence in probability are understood with respect to the infinite product distribution $F_0^\infty$ of $F_0$.
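
As a small illustration of the posterior formula, the following sketch computes the posterior weights when the prior is supported on finitely many candidate densities, so that the integrals reduce to finite sums. The family `1 + theta * (2x - 1)` on [0, 1], the three values of `theta`, the sample size and the helper names are all assumptions made for this example; none of them come from the paper.

```python
import math
import random


def posterior_weights(prior, densities, sample, f0):
    """Posterior mass of each candidate density under a finitely supported prior.

    Uses Pi_n({f_j}) = R_n(f_j) Pi({f_j}) / sum_k R_n(f_k) Pi({f_k}),
    where R_n(f) = prod_i f(X_i) / f0(X_i), evaluated on the log scale.
    """
    log_terms = []
    for p, f in zip(prior, densities):
        log_rn = sum(math.log(f(x)) - math.log(f0(x)) for x in sample)
        log_terms.append(math.log(p) + log_rn)
    shift = max(log_terms)  # subtract the maximum to avoid overflow
    unnorm = [math.exp(t - shift) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def make_density(theta):
    # Hypothetical toy family: density 1 + theta * (2x - 1) on [0, 1], |theta| < 1.
    return lambda x: 1.0 + theta * (2.0 * x - 1.0)


thetas = [-0.5, 0.0, 0.5]
densities = [make_density(t) for t in thetas]
prior = [1.0 / 3.0] * 3
f0 = densities[2]  # the "true" density in this toy setting

rng = random.Random(0)


def draw_from_f0():
    # Rejection sampling with uniform proposals; f0 is bounded above by 1.5.
    while True:
        x, u = rng.random(), rng.random()
        if u * 1.5 <= f0(x):
            return x


sample = [draw_from_f0() for _ in range(2000)]
weights = posterior_weights(prior, densities, sample, f0)
```

Because $f_0$ cancels between numerator and denominator, the weights do not depend on which density is used as the reference in $R_n$; the ratio form is only a convenient normalization.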

The aim of this article is to present general theorems on posterior convergence rates at $f_0$. By the posterior convergence rate theorems of Ghosal, Ghosh and Van der Vaart (2000), we know that the prior concentration rate and the rate of metric entropy together completely determine the convergence rate of posterior distributions. More specifically, a key inequality to determine almost sure convergence rates of posterior distributions is that for each $\varepsilon > 0$,
\[ \int_{\mathbb{F}} R_n(f) \, \Pi(df) \ge e^{-3n\varepsilon^2} \, \Pi\big( f : H(f_0,f)^2 \, \|f_0/f\|_\infty \le \varepsilon^2 \big) \]
almost surely for all sufficiently large $n$, where $\|g\|_\infty$ stands for the supremum norm of the function $g$ on $\mathbb{X}$. This inequality was obtained by Ghosal et al. (2000, Lemma 8.4) under mild assumptions. It appears in almost all papers handling strong convergence rates of posterior distributions. The reason is that in order to get the convergence rate of posterior distributions one needs to find a suitable lower bound for the denominator in the expression of posterior distributions. This is successfully done in Ghosal et al. (2000), who suggested that the prior $\Pi$ puts a sufficient amount of mass around the true density function $f_0$ in the sense that $\Pi\big( f : H(f_0,f)^2 \, \|f_0/f\|_\infty \le \varepsilon_n^2 \big) \ge e^{-n\varepsilon_n^2 c}$ for some fixed constant $c$. Such a sequence $\{\varepsilon_n\}$ is referred to as the concentration rate of the prior $\Pi$ around $f_0$. Here we
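give a slightly stronger result. We introduce a modification of the Hellinger distance
\[ H_*(f,g) = \Big( \int_{\mathbb{X}} \big( \sqrt{f(x)} - \sqrt{g(x)} \big)^2 \Big( \frac{2}{3} \sqrt{\frac{f(x)}{g(x)}} + \frac{1}{3} \Big) \mu(dx) \Big)^{1/2}. \]
It is clear that the inequality $\|f_0/f\|_\infty \ge 1$ holds for all density functions $f$ and $f_0$ such that the supremum is well-defined, and that equality holds if and only if $f = f_0$ almost surely. Observe also that $H_*(f_0,f) \ne H_*(f,f_0)$ in general and that $3^{-1/2} H(f_0,f) \le H_*(f_0,f)$. Moreover, we have
\[ H_*(f_0,f)^2 \le H(f_0,f)^2 \, \Big\| \frac{2}{3} \sqrt{f_0/f} + \frac{1}{3} \Big\|_\infty \le H(f_0,f)^2 \, \|f_0/f\|_\infty^{1/2} \le H(f_0,f)^2 \, \|f_0/f\|_\infty, \]
which yields
\[ \{ f \in \mathbb{F} : H_*(f_0,f) \le \varepsilon_n \} \supset \{ f \in \mathbb{F} : H(f_0,f)^2 \, \|f_0/f\|_\infty^{1/2} \le \varepsilon_n^2 \} \supset \{ f \in \mathbb{F} : H(f_0,f)^2 \, \|f_0/f\|_\infty \le \varepsilon_n^2 \}. \]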
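
The two-sided comparison between $H$ and the modified distance can be checked numerically on discrete densities. The sketch below assumes that the modified distance carries the weight $\frac{2}{3}\sqrt{f/g} + \frac{1}{3}$ inside the Hellinger integral, which is the form consistent with the bounds $3^{-1/2} H \le H_*$ and $H_*^2 \le H^2 \|f_0/f\|_\infty^{1/2}$ used in the text; the two probability vectors are arbitrary choices for illustration.

```python
import math


def hellinger(f, g):
    # Unnormalized Hellinger distance between two discrete densities.
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(f, g)))


def hellinger_star(f, g):
    # Assumed form of the modified distance: weight (2/3) sqrt(f/g) + 1/3.
    return math.sqrt(sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 * (2.0 / 3.0 * math.sqrt(a / b) + 1.0 / 3.0)
        for a, b in zip(f, g)
    ))


f0 = [0.5, 0.3, 0.2]
f = [0.1, 0.3, 0.6]

h = hellinger(f0, f)
hs = hellinger_star(f0, f)       # H_*(f0, f)
hs_rev = hellinger_star(f, f0)   # H_*(f, f0), differs from H_*(f0, f) here
sup_ratio = max(a / b for a, b in zip(f0, f))  # sup norm of f0 / f
```

In this example the modified distance is not symmetric, and it sits between $3^{-1/2} H(f_0,f)$ and $H(f_0,f)\,\|f_0/f\|_\infty^{1/4}$, the square root of the bound on $H_*^2$.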
The following simple lemma shows that, for $\varepsilon_n$ to be a prior concentration rate, it is enough to assume $\Pi(W_{\varepsilon_n}) \ge e^{-n\varepsilon_n^2 c}$, where $W_\varepsilon = \{ f \in \mathbb{F} : H_*(f_0,f) \le \varepsilon \}$.

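Lemma 1. Let $\varepsilon > 0$ and $c > 0$. Then the inequality
\[ F_0^\infty \Big( \int_{\mathbb{F}} R_n(f) \, \Pi(df) \le e^{-n\varepsilon^2 (3+2c)} \, \Pi(W_\varepsilon) \Big) \le e^{-n\varepsilon^2 c} \]
holds for all $n$.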

Lemma 1 provides a useful approach to compute prior concentration rates, particularly for models in which the rate of convergence is governed by the prior concentration rate. It leads to a simplification of the proof of Theorem 2.2 of Ghosal et al. (2000). Furthermore, we shall present general posterior convergence rate theorems in which the well-known assumptions on metric entropies and summability of the square root of the prior probability are also weakened. We shall apply the Hausdorff $\alpha$-entropy $J(\delta, \mathcal{G}, \alpha)$ introduced by Xing et al. (2008). Denote by $L_\mu$ the space of all nonnegative integrable functions with the norm $\|f\|_1 = \int_{\mathbb{X}} f(x) \, \mu(dx)$. Write $\log 0 = -\infty$.

Definition. Let $\alpha \ge 0$ and $\mathcal{G} \subset \mathbb{F}$. For $\delta > 0$, the Hausdorff $\alpha$-entropy $J(\delta, \mathcal{G}, \alpha)$ with respect to the prior distribution $\Pi$ is defined as
\[ J(\delta, \mathcal{G}, \alpha) = \log \inf \sum_{j=1}^{N} \Pi(B_j)^\alpha, \]
where the infimum is taken over all coverings $\{B_1, B_2, \dots, B_N\}$ of $\mathcal{G}$, where $N$ may take the value $\infty$, such that each $B_j$ is contained in some Hellinger ball $\{ f : H(f_j, f) < \delta \}$ of radius $\delta$ and center at $f_j \in L_\mu$.

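Note that the infimum can equivalently be taken over all partitions $\{P_1, P_2, \dots, P_N\}$ of $\mathcal{G}$ such that the Hellinger radius of each subset $P_j$ does not exceed $\delta$. It was proved in Xing et al. (2008, Lemma 1) that the Hausdorff $\alpha$-entropy $J(\delta, \mathcal{G}, \alpha)$ is an increasing subadditive function of $\mathcal{G}$ and satisfies $J(\delta, \mathcal{G}, \alpha) \le \log N(\delta, \mathcal{G})$ for all $\alpha \ge 0$, where $N(\delta, \mathcal{G})$ stands for the minimal number of Hellinger balls of radius $\delta$ needed to cover $\mathcal{G}$. For $0 < \alpha < 1$ and each $\mathcal{G} \subset \mathbb{F}$, we also obtained the following useful inequality
\[ e^{J(\delta, \mathcal{G}, \alpha)} \le \Pi(\mathcal{G})^\alpha \, N(\delta, \mathcal{G})^{1-\alpha}. \]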
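
To see why the Hausdorff $\alpha$-entropy can be much smaller than the metric entropy, one can evaluate the sum $\sum_j \Pi(P_j)^\alpha$ for a fixed partition and compare it with the number of cells. The sketch below assumes the inequality takes the form $e^{J} \le \Pi(\mathcal{G})^\alpha N^{1-\alpha}$, which is the form used later in the proof of Corollary 4; the geometric prior masses and the function names are hypothetical. Any fixed partition only gives an upper bound on the infimum defining $J$.

```python
import math


def log_alpha_sum(masses, alpha):
    # log of sum_j Pi(P_j)^alpha for one partition; an upper bound for J.
    return math.log(sum(p ** alpha for p in masses))


def entropy_upper_bound(masses, alpha):
    # alpha * log Pi(G) + (1 - alpha) * log N for the same partition
    # (concavity of x -> x^alpha, i.e. Jensen's inequality).
    n_cells = len(masses)
    total = sum(masses)
    return alpha * math.log(total) + (1.0 - alpha) * math.log(n_cells)


# Hypothetical prior masses of 20 cells, decaying geometrically.
masses = [2.0 ** -(j + 1) for j in range(20)]

for alpha in (0.1, 0.5, 0.9):
    print(alpha, log_alpha_sum(masses, alpha),
          entropy_upper_bound(masses, alpha), math.log(len(masses)))
```

When the prior mass is spread unevenly over the cells, the $\alpha$-sum stays bounded as the number of cells grows, while $\log N$ keeps increasing.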

Our first result in this paper is the following general theorem on posterior strong convergence rates. Denote $A_\varepsilon = \{ f : H(f_0, f) \ge \varepsilon \}$.

Theorem 1. Let {£n}^i o.nd {£n}^i be positive sequences such that n min(£^, e^) — * 00 

as n — ^ 00. Suppose that there exist constants ci > 0, C2 > 0, C3 > 0, < a < 1 and a 

°° _ -2 
sequence {Qn}'^=i of subsets on F such that "^ e ^^"'^^ < 00 and 

n=l 

00 _ _ 

(1) V e'^*^'^"'^"'")~"'^"^i < 00, 

n=l 



(2) E e"^" (3+3C2+C3) n(A,„ \ G^) < oo, 

n=l 



(3) n(/ :if,(/o, /)<£„) >e-^tc3 



T/ien /or £^ = max(£^, e^) anc? eac/i r > 2 + y -^ — °^!.q"'^^ "^ > ^c /iave n^(Arg„) ^ 
almost surely as n ^ oo. 

As direct applications we have 

Corollary 1. Let c\ > 0, C2 > 0, C3 > and < a < 1. Suppose that {£n}^i is 
a positive sequence satisfying ^ e "'^"'^^ < 00 and suppose that there exists a sequence 
{Gn}'^=i of subsets on ¥ such that 

(1) J{en,Qn,a) < nelci, 

(2) n(F \ ^„) < e"" ^" (3+3C2+C3) ^ 

(3) n(/:if,(/o,/)<£. ) >e-^"^3. 



Then for each r > 2 + y -^ — '^^\^a^ '^^ , we have that Iln{Arer,) -^ almost surely 
as n ^> 00. 

Proof. It is clear that all conditions of Theorem 1 are fulfilled if we let Sn = Sn = £n and 
replace the ci in Theorem 1 by ci + C2 of Corollary 1, and the proof is complete. 

Corollary 1 extends Theorem 2.2 of Ghosal et al. (2000), in which they have stronger conditions than (1) and (3) of Corollary 1. It is probably worth mentioning that for several important prior distributions of infinite-dimensional statistical models, the quantities $J(\varepsilon_n, \mathcal{G}_n, \alpha)$ are uniformly bounded for all $n$ and hence condition (1) of Corollary 1 is trivially fulfilled, whereas general metric entropies of the sieve $\mathcal{G}_n$ grow to infinity as the sample size increases. A slightly different version of Theorem 1 is the following consequence, which is in fact an extension of Proposition 1 of Walker et al. (2007).

Corollary 2. Let {£n}^i be a positive sequence such that ne^ ^ 00 as n ^ 00. Suppose 

that there exist constants ci > 0, C2 > 0, C3 > 0, < a < 1 and a sequence {Uj^iQnj}'^=i 

°° _ 2 
with Qnj C F such that ^ e '^^^'^'^ < 00 and 



n=l 



(1) E E A^(£n,^n,)'-"n(^^,)«e-^"^^ < 00; 

n=l j = l 



(2) E e"^" (3+3C2+C3) n(A,„ \ U- i^„,) < oo, 



n=l 



(3) n(/:if,(/o,/)<£, )>e-4-3. 



r/ien for each r > 2 + J ' ""*" "^'^^+"'^3+^1) ^ ^^ /iave t/iat n„(Arg„) ^ almost surely as 

n -^ 00. 

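Proof. Let $\mathcal{G}_n = \bigcup_{j=1}^\infty \mathcal{G}_{nj}$. We only need to verify condition (1) of Theorem 1 for such a sieve $\mathcal{G}_n$. By Lemma 1 of Xing et al. (2008) we have
\[ \sum_{n=1}^\infty e^{J(\varepsilon_n, \mathcal{G}_n, \alpha) - n\varepsilon_n^2 c_1} \le \sum_{n=1}^\infty \sum_{j=1}^\infty e^{J(\varepsilon_n, \mathcal{G}_{nj}, \alpha) - n\varepsilon_n^2 c_1} \le \sum_{n=1}^\infty \sum_{j=1}^\infty N(\varepsilon_n, \mathcal{G}_{nj})^{1-\alpha} \, \Pi(\mathcal{G}_{nj})^\alpha \, e^{-n\varepsilon_n^2 c_1} < \infty. \]
Corollary 2 then follows from Theorem 1.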

The assertion of Theorem 1 is an almost sure statement that the posterior masses outside a Hellinger ball with a multiple of $\varepsilon_n$ as radius converge to zero almost surely. Now we give an in-probability assertion under weaker conditions. Denote
\[ V(f,g) = \int_{\mathbb{X}} f(x) \Big( \log \frac{f(x)}{g(x)} \Big)^2 \mu(dx). \]

Theorem 2. Let {£n}^i and {en}'^=i be positive sequences such that n min(£^,£^) — > 00 
as n ^ 00. Suppose that there exist constants ci > 0, C2 > 0, < ct < 1 and a sequence 
{^n}5^i of subsets on F such that 

(1) J{en,QmCt) — ne'^ci — ;► — 00 as n — *> cxd, 

(2) e"^~"(2+c2)n(A,„\^„)^0 as n ^ 00, 

(3) n(/ : KifoJ) < el and V{foJ) < el) > e^^'-^^ 



Then for en = max(£„,£ji) and each r > 2 + J ^ " ^^^ ^'^' , we have that liniyAre^) -^ 
in probability as n — > 00. 

Observe that the conditions (2) of Theorem 1 and Theorem 2 are only used to ensure that $\Pi_n(A_{\varepsilon_n} \setminus \mathcal{G}_n) \to 0$ as $n \to \infty$. So one can replace the conditions (2) of these theorems by $\Pi_n(A_{\varepsilon_n} \setminus \mathcal{G}_n) \to 0$ as $n \to \infty$ almost surely and in probability respectively. Theorem 2 is an extended version of Theorem 2.1 of Ghosal et al. (2001) and Theorem 1 of Walker et al. (2007). Furthermore, as a consequence of Theorem 2 we obtain the following slight improvement of Theorem 5 of Ghosal et al. (2007b).

Corollary 3. Let {en}'^=i be a positive sequence such that ne^ ^ oo as n ^ oo. Suppose 
that there exist constants ci > 0, C2 > 0, < a < 1 and a sequence {U°%iQnj}'^=i with 

Gnj C F. // 

2 °° 

(1) e-"^"^i J2 N{en.GnjV-'''n{Qnjr ^0 as n^oo; 



(2) e'"^"(2+c2)n(A^^\u°^i^^j) — >0 as n ^ oo; 

(3) n(/ : KifoJ) < el and V{foJ) < el) > e--"-^ 



then for each r > 2 + y " j^"^^ '^ , we have that 11^ (A^ g^ ) ^ m probability as n ^^ oo. 
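Proof. For $\mathcal{G}_n = \bigcup_{j=1}^\infty \mathcal{G}_{nj}$, by Lemma 1 of Xing et al. (2008) we get
\[ e^{J(\varepsilon_n, \mathcal{G}_n, \alpha) - n\varepsilon_n^2 c_1} \le \sum_{j=1}^\infty e^{J(\varepsilon_n, \mathcal{G}_{nj}, \alpha) - n\varepsilon_n^2 c_1} \le e^{-n\varepsilon_n^2 c_1} \sum_{j=1}^\infty N(\varepsilon_n, \mathcal{G}_{nj})^{1-\alpha} \, \Pi(\mathcal{G}_{nj})^\alpha, \]
which tends to zero as $n \to \infty$, and so condition (1) of Theorem 2 holds. Then by Theorem 2 we conclude the proof.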

The above theorems cannot yield a convergence rate $1/\sqrt{n}$ because of the assumption $n\varepsilon_n^2 \to \infty$. In particular, these theorems do not serve finite-dimensional models well. Ghosal et al. (2000, 2007a) have obtained a nice theorem to handle such models. Denote $B_{\varepsilon_n} = \{ f : K(f_0, f) \le \varepsilon_n^2 \text{ and } V(f_0, f) \le \varepsilon_n^2 \}$. Now our result is

Theorem 3. Let {£n}^i be a positive sequence such that nel are uniformly bounded 
away from zero, i.e., there exists a constant cq > such that nel> cq for all n . Suppose 
that there exist constants < a < 1, ci < ^^^ and a sequence {Gn}'^=i of subsets on F 
such that 



(1) '"""nfsT^'-' ^O - "-~. 



(2) exp(j(ifi,{/£a„: ie„<if(/„,/)<2iE„},a)) < e'"'"''- n{B,.J 

for all sufficiently large positive integers j and n. 
Then for each rn ^ oo we have that n„(Ar„g„) ^ in probability as n ^ oo. 



Remark. Here we adopt the convention that if the denominator of a quotient equals zero then the numerator must also be zero. Hence, Theorem 3 is still true even when $\Pi(B_{\varepsilon_n}) = 0$ for some $n$.

Corollary 4. Let {£n}^i be a positive sequence such that ne^ are uniformly hounded 
away from zero. Suppose that there exist constants ci, ci and a sequence {Qn}'^=i of subsets 
on F such that 



(1) logiV(i|^, {feGn-. jsn < HifoJ) < 2jen}) < cms 
for all integer j and n large enough, 



(2) ^^^^Ji^^^O as n-oo, 



(3) — ^^ — — ^T^ — ^ — < e^2j "£„ for all integer j and n large enough. 



Then for each r„ ^ oo we have that Iln[Ar„g^) ^ in probability 



as n ^^ oo. 



Corollary 4 is a slightly stronger version of Theorem 2.4 of Ghosal et al. (2000). A notable improvement in Corollary 4 is that we have no restriction on the constant $c_2$, whereas their constant $c_2$ equals half of some universal testing constant.

Proof of Corollary 4- We only need to check condition (2) of Theorem 3. It follows from 
Lemma 1 of Xing et al.(2008) and conditions (1) and (3) that 

^(^, {feQn: JSn < H{fo, f) < 2j£^}, a) 

< alogH(/ G Gn : jSn < HifoJ) < 2j£,) 

+(1 - a) logiV(^, {feQn: JSn < H{fo, f) < 2je^}) 

< alog(e^^^""^"H(S,2)) + (1 - a)c,nel 

= («C2 + ^^—:^)fnel + alogU{B,2j. 

Taking a small a in (0, 1) and then letting j be large enough, we have that ac2 + ~" < 



l-g 
18 



J^ 



and hence condition (2) of Theorem 3 is fulfilled. The proof of Corollary 4 is complete. 



We will conclude this section by presenting an analogue of Theorem 3, which gives an 
almost sure assertion under stronger conditions. 



Theorem 4. Let {£n}^i be a positive sequence such that there exists a constant cq > 
such that ne\> Cq logn for all large n . Suppose that there exist constants < a < 1, 
ci < ^^^, C2> -^ and a sequence {Qn]'^=i of subsets on F such that 

n=l 



(2) exp(j(^, {feg^: jsn < H{fo, f) < 2j£4, a)) < e--^"^^l n(Ti^,J" 

for all sufficiently large positive integers j and n. 
Then for each r large enough we have that Iln{Are„) —* almost surely as n — > oo. 

Following the proof of Corollary 4 line by line, we obtain the following consequence of Theorem 4.

Corollary 5. Let {£n}5^i be a positive sequence such that there exists a constant cq > 
such that ne\ > cq logn for all large n . Suppose that there exist constants ci, c^ > -^, 
cs and a sequence {^n}^i of subsets on ¥ such that 



(1) logiV(i|^, {feg^: je^ < HifoJ) < 2je^}) < c^ne 
for all integer j and n large enough, 



n=l 



(3) — ^^ — — Yuw — V — ^^^"' "'^" ^'-'^ ^^^ integer j and n large enough. 

Then for each r large enough we have that n^(Are„) -^ almost surely as n ^ oo. 

3. Illustrations. In this section we apply our theorems to Bernstein polynomial priors, priors based on uniform distribution, log spline models and finite-dimensional models. This leads to some improvements on known results for these models.

3.1. Bernstein polynomial prior. A Bernstein polynomial prior is a probability measure on the space of continuous probability distribution functions on $[0,1]$. Petrone (1999) introduced the Bernstein polynomial prior $\Pi$ by putting a prior distribution on the class of Bernstein densities in $[0,1]$ in the following way:
\[ b(x; k, F) = \sum_{j=1}^{k} \big( F(j/k) - F((j-1)/k) \big) \, \beta(x; j, k-j+1), \]
where $\beta(x; a, b)$ stands for the beta density $\beta(x; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1}$, $k$ has a distribution $\rho(\cdot)$, and $F$ is a random distribution independent of $k$. In other words, if $\mathcal{B}_j$ stands for the set of all Bernstein densities of order $j$, then $\Pi(\cdot) = \sum_{j=1}^\infty \rho(j) \, \Pi_{\mathcal{B}_j}(\cdot)$, where the probability measure $\Pi_{\mathcal{B}_j}$ is the normalized restriction of $\Pi$ on $\mathcal{B}_j$. We refer to Petrone and Wasserman (2002) for a detailed description of the Bernstein polynomial prior, in which consistency of the posterior distribution for the Bernstein polynomial prior is discussed. Rates of convergence have been established under suitable tail conditions on $\rho$ by Ghosal (2001) and Walker et al. (2007), where the convergence is understood as convergence in $F_0^\infty$-probability. Ghosal (2001, Theorem 2.3) proved that the prior
concentration rate is (logn)"^'^/n^/"^ and the entropy rate is (logn)^'^/n^/'^ under the tail 
assumption p{j) ^ e~'^^ for all j, which yields the convergence rate (logn)^/^/n^/"^. Walker 
et al.(2007) obtained the entropy rate {logn)^'^ /n^'^ under the lighter tail condition p{j) < 
g-4jiogj _ (^x/jj^4 £qj^ ^Yl j^ which leads the convergence rate (log?i)^/"^/n^/"^. In the 
following we establish the entropy rate 1/n'^ under the tail condition p{j) < {1/ j^Y^ for 
all J, where cq is any fixed positive constant and 7 is any fixed constant strictly less than 
1/2. Hence we also get the convergence rate {lognY'^ /n^'^ under the weaker condition 
p(j) < {l/j^Y''. This improves the result of Walker et al.(2007). 
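
For readers who want to experiment with Bernstein densities, the following sketch evaluates a Bernstein density of order $k$ as a mixture of beta densities with parameters $(j, k-j+1)$ and weights $F(j/k) - F((j-1)/k)$. The function names and the two distribution functions used in the checks are illustrative assumptions, not part of the paper.

```python
import math


def beta_pdf(x, a, b):
    # Beta(a, b) density at x in (0, 1), computed on the log scale.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1.0 - x))


def bernstein_density(x, k, cdf):
    # Mixture of Beta(j, k - j + 1) densities with weights cdf(j/k) - cdf((j-1)/k).
    return sum(
        (cdf(j / k) - cdf((j - 1) / k)) * beta_pdf(x, j, k - j + 1)
        for j in range(1, k + 1)
    )


# With the uniform distribution function the mixture is the uniform density.
print(bernstein_density(0.3, 7, lambda t: t))

# A non-uniform choice, F(t) = t^2, still gives a density integrating to one
# (checked with a midpoint rule on 2000 cells).
grid = [(i + 0.5) / 2000 for i in range(2000)]
mass = sum(bernstein_density(x, 5, lambda t: t * t) for x in grid) / 2000
print(mass)
```

The uniform case works because the equally weighted beta densities of order $k$ sum to the binomial expansion of $(x + (1-x))^{k-1}$.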

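From Ghosal (2001) it follows that there exists an absolute constant $c > 0$ such that $N(\varepsilon_n, \mathcal{B}_j) \le (c/\varepsilon_n)^j$ for all $j$. Given $\gamma < 1/2$, choose $\mathcal{G}_{nj} = \mathcal{B}_j$ and $\varepsilon_n = 1/n^\gamma$. Take $0 < \alpha < 1$ such that $\gamma(2 + 1/d) < 1$ with $d = c_0\alpha/(1-\alpha)$. To verify condition (1) of Corollary 3, by $\rho(j) \le (1/j^j)^{c_0}$ we obtain that
\[ e^{-n\varepsilon_n^2 c_1} \sum_{j=1}^\infty N(\varepsilon_n, \mathcal{B}_j)^{1-\alpha} \, \Pi(\mathcal{B}_j)^\alpha \le e^{-c_1 n^{1-2\gamma}} \sum_{j=1}^\infty \Big( \frac{c n^\gamma}{j^d} \Big)^{j(1-\alpha)} \le e^{-c_1 n^{1-2\gamma}} \sum_{1 \le j \le (2cn^\gamma)^{1/d}} \Big( \frac{c n^\gamma}{j^d} \Big)^{j(1-\alpha)} + \sum_{j > (2cn^\gamma)^{1/d}} \Big( \frac{1}{2} \Big)^{j(1-\alpha)}, \]
where the last sum on the right hand side tends to zero as $n \to \infty$. To estimate the first term, note that for $g(t) = (c n^\gamma / t^d)^t$ we have that $g'(t) = (c n^\gamma / t^d)^t \big( \log(c n^\gamma / t^d) - d \big) = 0$ is equivalent to $t = c^{1/d} n^{\gamma/d} e^{-1}$. This implies $g(j) \le e^{d_1 n^{\gamma/d}}$ for all $j$, where $d_1$ stands for the constant $d \, c^{1/d} e^{-1}$. Therefore, by the inequality $x \le e^x$ for $x \ge 0$ we get
\[ e^{-c_1 n^{1-2\gamma}} \sum_{1 \le j \le (2cn^\gamma)^{1/d}} \Big( \frac{c n^\gamma}{j^d} \Big)^{j(1-\alpha)} \le e^{-c_1 n^{1-2\gamma}} \, (2cn^\gamma)^{1/d} \, e^{d_1 (1-\alpha) n^{\gamma/d}} \longrightarrow 0 \]
as $n \to \infty$, since $1 - 2\gamma > \gamma/d$, and hence condition (1) of Corollary 3 holds. Condition (2) of Corollary 3 is trivially fulfilled and hence by Corollary 3 we obtain that the entropy rate is at least $1/n^\gamma$ for any given $\gamma < 1/2$.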
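
The maximization step in this argument is easy to verify numerically. The sketch below works with $\log g(t) = t(\log c + \gamma \log n - d \log t)$, evaluates it at the claimed maximizer $t^* = c^{1/d} n^{\gamma/d} e^{-1}$, and compares with a grid search. The numerical values of `c`, `n`, `gamma` and `d` are arbitrary choices that satisfy $\gamma(2 + 1/d) < 1$.

```python
import math


def log_g(t, c, n, gamma, d):
    # log of g(t) = (c * n**gamma / t**d) ** t
    return t * (math.log(c) + gamma * math.log(n) - d * math.log(t))


def maximizer(c, n, gamma, d):
    # Stationary point of log g: log(c n^gamma / t^d) = d.
    return c ** (1.0 / d) * n ** (gamma / d) / math.e


c, n, gamma, d = 2.0, 10_000, 0.4, 3.0   # illustrative values only
t_star = maximizer(c, n, gamma, d)
max_val = log_g(t_star, c, n, gamma, d)  # equals d * t_star = d_1 * n**(gamma/d)

grid_max = max(log_g(0.01 * i, c, n, gamma, d) for i in range(1, 5001))
print(t_star, max_val, grid_max)
```

Since $\log g$ is concave in $t$ (its second derivative is $-d/t$), the stationary point is the global maximum, which is what the bound $g(j) \le e^{d_1 n^{\gamma/d}}$ uses.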




3.2. Prior based on uniform distribution. Ghosal et al. (1997) established posterior consistency for prior distributions based on uniform distributions of finite subsets. Priors based on discrete uniform distributions were further studied in Ghosal et al. (2000), in which they used the bracketing entropy as a tool to compute the convergence rate of posterior distributions. As an application of Theorem 1, we now give a slight extension. Given $\varepsilon_n > 0$, assume that there exist density functions $f_1, \dots, f_{N_n}$ such that all sets $\{ f : H_*(f, f_j) \le 3^{-1/2} \varepsilon_n \}$ form a covering of $\mathbb{F}$. Denote by $\Pi_n$ the uniform discrete probability measure on the set $\{ f_1, \dots, f_{N_n} \}$. Define a prior distribution $\Pi = \sum_{j=1}^\infty a_j \Pi_j$ for a given sequence $a_j$ with $a_j > 0$ and $\sum_{j=1}^\infty a_j = 1$.

Theorem 5. If $\log N_n + \log n + \log \frac{1}{a_n} = O(n\varepsilon_n^2)$ as $n \to \infty$, then the posterior distributions $\Pi_n$ converge almost surely at least at the rate $\varepsilon_n$, that is, $\Pi_n( f : H(f_0, f) \ge r\varepsilon_n ) \to 0$ as $n \to \infty$ almost surely for any given sufficiently large $r$.

Proof. From $\log n = O(n\varepsilon_n^2)$ it follows that $\sum_n e^{-c n \varepsilon_n^2} < \infty$ for all large $c > 0$. For any $f$ with $H_*(f, f_j) \le 3^{-1/2} \varepsilon_n$ we have $H(f, f_j) \le 3^{1/2} H_*(f, f_j) \le \varepsilon_n$. Hence we obtain that $\{ f : H_*(f, f_j) \le 3^{-1/2} \varepsilon_n \} \subset \{ f : H(f, f_j) \le \varepsilon_n \}$ and then $J(\varepsilon_n, \mathbb{F}, 0) \le \log N_n = O(n\varepsilon_n^2)$, which implies condition (1) of Theorem 1. Condition (2) is trivially fulfilled. Condition (3) follows from the fact that $\Pi( f : H_*(f_0, f) \le \varepsilon_n ) \ge \Pi( f : H_*(f_0, f) \le 3^{-1/2} \varepsilon_n ) \ge a_n / N_n \ge e^{-n\varepsilon_n^2 c_3}$ for some $c_3 > 0$, since $\{ f : H_*(f_0, f) \le 3^{-1/2} \varepsilon_n \}$ contains at least one density function of the set $\{ f_1, \dots, f_{N_n} \}$. The proof of Theorem 5 is complete.

It seems to be unusual to find a covering of the density space with covering sets of type $\{ f : H_*(f, f_j) \le C\varepsilon_n \}$. The most widely used norm for continuous functions should be the supremum norm. In fact, one can easily construct a new covering $\bigcup_{j=1}^{N_n} \{ f : H_*(f, f_j) \le C\varepsilon_n \}$ of $\mathbb{F}$ in terms of a given covering $\bigcup_{j=1}^{N_n} \{ f : \|\sqrt{f} - \sqrt{g_j}\|_\infty \le \varepsilon_n \}$ with nonnegative bounded functions $g_j$ (not necessarily density functions), as shown in the following: Take $f_j(x) = (\sqrt{g_j(x)} + \varepsilon_n)^2 / \int_{\mathbb{X}} (\sqrt{g_j(x)} + \varepsilon_n)^2 \mu(dx)$, where we assume that $\mu$ is a probability measure on $\mathbb{X}$ and $\varepsilon_n \le 1$ for all $n$. Then for each $f$ with $\|\sqrt{f} - \sqrt{g_j}\|_\infty \le \varepsilon_n$, that is, $\sqrt{g_j(x)} - \varepsilon_n \le \sqrt{f(x)} \le \sqrt{g_j(x)} + \varepsilon_n$ on $\mathbb{X}$, we have



2 I A ^ I /-f*/™^ ../ J™^ ^ /'I r o , ^2 
j 



l + Aei + Asr, \lff{x)^{dx)<{l + 2e 



n) 1 



where f* is some density function in {/ : ||v7 — y^||oo < £n} and the last inequality 
follows from ||v7*||i < ||v7*l|2 = 1- This implies that 



HM,f,)<(^-{l + 2e^) + -y H{f,f^)<2H{f,f 




<4£„ + 2n /" {^g,{x)+en)^ ^i{dx)y - 1 j < 4£^ + 2 ((1 + 2£,) - l) =85^. 
Therefore, we have 

|J{/: H,{fjj)<9>e^]^ \J{f: ||/ - (7,||oo < £n} D F. 
i=i i=i 

Observe that the numbers of covering subsets in both type coverings are equal. 

For models with $H_*(f,g)$ controlled by a constant multiple of the Hellinger metric $H(f,g)$, such as the exponential family and a model with uniformly bounded supremum norm $\|f/g\|_\infty$ for all density functions $f$ and $g$, it is not necessary to assume that the probability measures $\Pi_n$ constructed above concentrate on a finite number of points. Here we give an extension of Theorem 5. Let $c_0 > 1$ and let $\mathbb{F}_{c_0}$ be a subfamily of $\mathbb{F}$ such that $H_*(f,g) \le c_0 H(f,g)$ for all $f, g \in \mathbb{F}_{c_0}$. Given $\varepsilon_n > 0$, let $\{ P_1, \dots, P_{K_n} \}$ be a partition of $\mathbb{F}_{c_0}$ such that for each $P_i$ there exists $f_i$ in $\mathbb{F}_{c_0}$ with $P_i \subset \{ f : H(f_i, f) \le \varepsilon_n/(2c_0) \}$. Take any probability measure $\Pi_n$ on $\mathbb{F}_{c_0}$ with $\Pi_n(P_i) = 1/K_n$ for $i = 1, 2, \dots, K_n$. Define then a prior distribution $\Pi = \sum_{j=1}^\infty a_j \Pi_j$ for a given sequence $a_j$ with $a_j > 0$ and $\sum_{j=1}^\infty a_j = 1$. Now we have

Theorem 6. Let $f_0 \in \mathbb{F}_{c_0}$. If $\log K_n + \log n + \log \frac{1}{a_n} = O(n\varepsilon_n^2)$ as $n \to \infty$, then the posterior distributions $\Pi_n$ converge at least at the rate $\varepsilon_n$ almost surely.

Proof. By the proof of Theorem 5, we only need to verify condition (3) of Theorem 1. Take $f_{i_n} \in \mathbb{F}_{c_0}$ such that $H(f_0, f_{i_n}) \le \varepsilon_n/(2c_0)$. Then, for all $f \in \mathbb{F}_{c_0}$ with $H(f_{i_n}, f) \le \varepsilon_n/(2c_0)$, we have that $H_*(f_0, f) \le c_0 H(f_0, f) \le c_0 H(f_0, f_{i_n}) + c_0 H(f_{i_n}, f) \le \varepsilon_n$. Hence we get that $\Pi( f : H_*(f_0, f) \le \varepsilon_n ) \ge \Pi(P_{i_n}) \ge a_n / K_n \ge e^{-n\varepsilon_n^2 c_3}$ for some $c_3 > 0$, and the proof of Theorem 6 is complete.

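Observe that, given a covering $\{ O_1, O_2, \dots, O_{K_n} \}$ of $\mathbb{F}_{c_0}$, one can easily construct a partition $\{ P_1, P_2, \dots, P_{K_n} \}$ of $\mathbb{F}_{c_0}$ in the following way: $P_1 = O_1 \cap \mathbb{F}_{c_0}$ and $P_i = \big( O_i - \bigcup_{l=1}^{i-1} P_l \big) \cap \mathbb{F}_{c_0}$ for $i = 2, 3, \dots, K_n$.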

Example (Exponential families). We consider the exponential family of all density functions of the form $e^{h(x)}$, where the function $h(x)$ belongs to a fixed bounded subset in the Sobolev space $C^p[0,1]$ with $p > 0$. A subclass of this family has been recently studied by Scricciolo (2006). Following a result of Kolmogorov and Tihomirov (1959), we know that the $\varepsilon$-entropy of this family with respect to the norm $\|\cdot\|_\infty$ equals $O(\varepsilon^{-1/p})$. Thus, using the above argument we get that $\log K_n = O(n\varepsilon_n^2)$ for $\varepsilon_n = n^{-p/(2p+1)}$ and hence by Theorem 6 the posterior distributions constructed above converge at the rate $\varepsilon_n = n^{-p/(2p+1)}$, which is known to be the optimal rate of convergence in the minimax sense under the Hellinger loss.
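
The rate in this example comes from balancing the entropy bound $\varepsilon^{-1/p}$ against $n\varepsilon^2$. The short sketch below checks that $\varepsilon_n = n^{-p/(2p+1)}$ solves $\varepsilon^{-1/p} = n\varepsilon^2$ exactly (constants in the $O(\cdot)$ terms are ignored); the function name and the values of `n` and `p` are illustrative.

```python
def balanced_rate(n, p):
    # Solves eps ** (-1 / p) == n * eps ** 2, i.e. eps = n ** (-p / (2p + 1)).
    return n ** (-p / (2.0 * p + 1.0))


for n in (10, 1000, 10 ** 6):
    for p in (0.5, 1.0, 2.0):
        eps = balanced_rate(n, p)
        # Both sides equal n ** (1 / (2p + 1)).
        print(n, p, eps, eps ** (-1.0 / p), n * eps ** 2)
```

Smoother classes (larger $p$) have smaller entropy, so the balance point moves toward the parametric rate $n^{-1/2}$.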

3.3. Log spline models. Log spline models for density estimation have been studied, among others, by Stone (1990) and Ghosal et al. (2000). Let $[(k-1)/K_n, k/K_n)$ with $k = 1, 2, \dots, K_n$ be a partition of the interval $[0,1)$. The space of splines of order $q$ relative to this partition is the set of all functions $f : [0,1) \to \mathbb{R}$ such that $f$ is $q-2$ times continuously differentiable on $[0,1)$ and the restriction of $f$ on each $[(k-1)/K_n, k/K_n)$ is a polynomial of degree strictly less than $q$. Let $J_n = q + K_n - 1$. This space of splines is a $J_n$-dimensional vector space with a B-spline basis $B_1(x), B_2(x), \dots, B_{J_n}(x)$; see Ghosal et al. (2000) for the details of such a basis. Let $\mathbb{F}$ be the set of all density functions in $C^\alpha[0,1]$. Assume that the true density function $f_0(x)$ is bounded away from zero and infinity. We consider the $J_n$-dimensional exponential subfamily of $C^\alpha[0,1]$ of the form
\[ f_\theta(x) = \exp\Big( \sum_{j=1}^{J_n} \theta_j B_j(x) - c(\theta) \Big), \]
where $\theta = (\theta_1, \theta_2, \dots, \theta_{J_n}) \in \Theta_0 = \{ (\theta_1, \theta_2, \dots, \theta_{J_n}) \in \mathbb{R}^{J_n} : \sum_{j=1}^{J_n} \theta_j = 0 \}$ and the constant $c(\theta)$ is chosen such that $f_\theta(x)$ is a density function in $[0,1]$. Each prior on $\Theta_0$ induces naturally a prior on $\mathbb{F}$. Let $\|\theta\|_\infty = \max_j |\theta_j|$ be the infinity norm on $\Theta_0$. Assume that $a_1 n^{1/(2\alpha+1)} \le K_n \le a_2 n^{1/(2\alpha+1)}$ for two fixed positive constants $a_1$ and $a_2$. Assume that the prior $\Pi$ for $\Theta_0$ is supported on $[-M, M]^{J_n}$ for some $M \ge 1$ and has a density function with respect to the Lebesgue measure on $\Theta_0$, which is bounded below by $a_3^{J_n}$ and above by $a_4^{J_n}$. Take a constant $d > 0$ such that $d \, \|\theta\|_\infty \le \|\log f_\theta(x)\|_\infty$ for all $\theta \in \Theta_0$. Ghosal et al. (2000, Theorem 4.5) proved that, if $f_0 \in C^\alpha[0,1]$ with $q \ge \alpha \ge 1/2$ and $\|\log f_0(x)\|_\infty \le dM/2$, the posteriors $\Pi_n$ converge in probability at the rate $n^{-\alpha/(2\alpha+1)}$. Using Corollary 5 we now get that under the same assumptions as in Ghosal et al. (2000, Theorem 4.5), the posteriors $\Pi_n$ are in fact convergent almost surely at the rate $\varepsilon_n = n^{-\alpha/(2\alpha+1)}$. To see this, take $\mathcal{G}_n = \mathbb{F}$. Clearly, $n\varepsilon_n^2 \ge \log n$ for all large $n$. Condition (1) of Corollary 5 has been verified by Ghosal et al. (2000) and condition (2) is trivially fulfilled. Condition (3) follows also from the proof of Theorem 4.5 in Ghosal et al. (2000), since the inequality $H_*(f_0, f_\theta) \le H(f_0, f_\theta) \, \|f_0/f_\theta\|_\infty^{1/2}$ holds for all $\theta \in \Theta_0$.

3.4. Finite-dimensional models. Let $\beta > 0$ and let $\Theta$ be a bounded subset of a finite-dimensional Euclidean space, with the Euclidean norm $\|\cdot\|$. Denote by $\mathbb{F}$ the family of all density functions $f_\theta$ with the parameter $\theta$ in $\Theta$ satisfying
\[ a_1 \|\theta_1 - \theta_2\|^\beta \le H(f_{\theta_1}, f_{\theta_2}) \le \sqrt{3} \, H_*(f_{\theta_1}, f_{\theta_2}) \le a_2 \|\theta_1 - \theta_2\|^\beta \]
for all $\theta_1, \theta_2 \in \Theta$, where $a_1$ and $a_2$ are two fixed positive constants. Assume that the true value $\theta_0$ is in $\Theta$ and that the density function of the prior distribution $\Pi$ with respect to the Lebesgue measure on $\Theta$ is uniformly bounded away from zero and infinity. Under slightly weaker conditions, Ghosal et al. (2000) proved that the posterior distributions $\Pi_n$ converge in probability at the rate $1/\sqrt{n}$. Now we give an almost sure assertion for this model.

Theorem 7. Under the above assumptions, the posterior distributions $\Pi_n$ converge almost surely at least at the rate $\sqrt{\log n}/\sqrt{n}$.

Proof. We shall apply Corollary 5 for $\mathcal{G}_n = \mathbb{F}$. Clearly, $n\varepsilon_n^2 = \log n$ for $\varepsilon_n = \sqrt{\log n}/\sqrt{n}$. Condition (1) has been verified in the proof of Theorem 5.1 of Ghosal et al. (2000). Condition (2) is trivially fulfilled. Using $H_*(f_{\theta_0}, f_\theta) \le 3^{-1/2} a_2 \|\theta_0 - \theta\|^\beta$, we have that $\Pi(W_{\varepsilon_n}) \ge \Pi\big( \theta : \|\theta - \theta_0\| \le (\sqrt{3}\,\varepsilon_n/a_2)^{1/\beta} \big)$. Hence, the verification of condition (3) follows along the same lines as the proof of Ghosal et al. (2000, Theorem 5.1) and then by Corollary 5 we conclude the proof of Theorem 7.



4. Lemmas and Proofs. In this section we give proofs of our lemmas and theorems. 
For simplicity of notations, we assume throughout this section that En — Sn — Sn- 

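Proof of Lemma 1. It is no restriction to assume that $\Pi(W_\varepsilon) > 0$. Using Jensen's inequality for the convex function $x^{-1/2}$ for $x > 0$ and Chebyshev's inequality, we obtain that
\[ F_0^\infty \Big( \int_{\mathbb{F}} R_n(f) \, \Pi(df) \le e^{-n\varepsilon^2(3+2c)} \, \Pi(W_\varepsilon) \Big) \le F_0^\infty \Big( \Big( \frac{1}{\Pi(W_\varepsilon)} \int_{W_\varepsilon} R_n(f) \, \Pi(df) \Big)^{-1/2} \ge e^{n\varepsilon^2(3+2c)/2} \Big) \]
\[ \le \frac{e^{-n\varepsilon^2(3+2c)/2}}{\Pi(W_\varepsilon)} \int_{W_\varepsilon} E \, R_n(f)^{-1/2} \, \Pi(df) = \frac{e^{-n\varepsilon^2(3+2c)/2}}{\Pi(W_\varepsilon)} \int_{W_\varepsilon} \Big( \int_{\mathbb{X}} f_0(x) \sqrt{\frac{f_0(x)}{f(x)}} \, \mu(dx) \Big)^n \Pi(df). \]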

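On the other hand, we have
\[ \int_{\mathbb{X}} f_0(x) \sqrt{\frac{f_0(x)}{f(x)}} \, \mu(dx) = 1 + \int_{\mathbb{X}} \big( \sqrt{f_0(x)} - \sqrt{f(x)} \big)^2 \frac{\sqrt{f_0(x)}}{\sqrt{f(x)}} \, \mu(dx) + \int_{\mathbb{X}} \big( f_0(x) - \sqrt{f(x) f_0(x)} \big) \mu(dx) \]
\[ = 1 + \int_{\mathbb{X}} \big( \sqrt{f_0(x)} - \sqrt{f(x)} \big)^2 \frac{\sqrt{f_0(x)}}{\sqrt{f(x)}} \, \mu(dx) + \frac{1}{2} \int_{\mathbb{X}} \big( \sqrt{f_0(x)} - \sqrt{f(x)} \big)^2 \mu(dx) \]
\[ = 1 + \frac{3}{2} H_*(f_0, f)^2 \le e^{\frac{3}{2} H_*(f_0,f)^2} \le e^{\frac{3}{2} \varepsilon^2}, \]
where the last inequality holds when $f \in W_\varepsilon$. Hence we get
\[ F_0^\infty \Big( \int_{\mathbb{F}} R_n(f) \, \Pi(df) \le e^{-n\varepsilon^2(3+2c)} \, \Pi(W_\varepsilon) \Big) \le e^{-n\varepsilon^2(3+2c)/2} \, e^{3n\varepsilon^2/2} = e^{-n\varepsilon^2 c}. \]
The proof of Lemma 1 is complete.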

In the proof of Theorem 1 we use the following Lemma, which is similar to Lemma 5 
of Barron et al.(1999). 

Lemma 2. Let C2 > and cs > 0. Let {£n}^i be a positive sequence such that n(VFg„) > 
e "■^"'^3 for all n and Yl ^ "'^"'^^ ^ qo_ /j q sequence {-Dnj^i o/ subsets in F satisfies 

n=l 

^ gne„ (3+3C2+C3) n(iD^) < OO, t/icn 11^(1)71) ^ almost surely as n ^ oo. 
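Proof. From Chebyshev's inequality and Fubini's theorem it turns out that

$$F_0^\infty\Big\{\int_{D_n} R_n(f)\,\Pi(df) \ge e^{-n\varepsilon_n^2(3+3c_2+c_3)}\Big\} \le e^{n\varepsilon_n^2(3+3c_2+c_3)}\,E\int_{D_n} R_n(f)\,\Pi(df)$$

$$= e^{n\varepsilon_n^2(3+3c_2+c_3)}\int_{D_n} E\,R_n(f)\,\Pi(df) = e^{n\varepsilon_n^2(3+3c_2+c_3)}\,\Pi(D_n)$$

for all $n$. Hence by the first Borel-Cantelli Lemma we get that

$$\int_{D_n} R_n(f)\,\Pi(df) \le e^{-n\varepsilon_n^2(3+3c_2+c_3)}$$

almost surely for all $n$ large enough. On the other hand, Lemma 1 and the first Borel-Cantelli Lemma yield that

$$\int_{\mathbb{F}} R_n(f)\,\Pi(df) \ge \Pi(W_{\varepsilon_n})\,e^{-n\varepsilon_n^2(3+2c_2)} \ge e^{-n\varepsilon_n^2(3+2c_2+c_3)}$$

almost surely for all $n$ large enough. Therefore, we obtain that with probability one, for all $n$ large enough,

$$\Pi_n(D_n) = \frac{\int_{D_n} R_n(f)\,\Pi(df)}{\int_{\mathbb{F}} R_n(f)\,\Pi(df)} \le e^{-n\varepsilon_n^2 c_2},$$

which tends to zero as $n \to \infty$, and the proof of Lemma 2 is complete.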
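The summability hypothesis of Lemma 2 can be made concrete with a small numerical sketch. It assumes the specific rate $n\varepsilon_n^2 = c_0\log n$ (the borderline case of the condition $n\varepsilon_n^2 \ge c_0\log n$ used in the proof of Theorem 4), so that $e^{-n\varepsilon_n^2 c_2} = n^{-c_0c_2}$. The constants and the function name below are illustrative choices, not values from the paper.

```python
import math

def borel_cantelli_terms(c0, c2, n_max):
    """Terms exp(-n * eps_n^2 * c2) under the assumed rate n * eps_n^2 = c0 * log(n)."""
    return [math.exp(-c0 * math.log(n) * c2) for n in range(1, n_max + 1)]

# Illustrative constants with c0 * c2 > 1, so the series is summable.
c0, c2 = 2.0, 1.0
s = c0 * c2

terms = borel_cantelli_terms(c0, c2, 10_000)
partial_sum = sum(terms)

# Integral-test bound: sum_{n>=1} n^(-s) <= 1 + 1 / (s - 1) for s > 1.
integral_bound = 1.0 + 1.0 / (s - 1.0)

print(partial_sum, integral_bound)
```

Under this assumed rate each term equals $n^{-c_0c_2}$, so the partial sums stay below the integral-test bound whenever $c_0c_2 > 1$; this is the situation in which the first Borel-Cantelli Lemma applies.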
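Proof of Theorem 1. It is clear that if condition (1) holds for some $\alpha = \alpha_0$ then it also holds for any $\alpha > \alpha_0$. So we may assume that $0 < \alpha < 1$. Given $r > 2 + \sqrt{\frac{2(3\alpha + 2\alpha c_2 + \alpha c_3 + c_1)}{1-\alpha}}$, we have

$$\Pi_n(A_{r\varepsilon_n}) = \Pi_n(A_{r\varepsilon_n} \setminus \mathcal{G}_n) + \Pi_n(\mathcal{G}_n \cap A_{r\varepsilon_n}).$$

It then follows from Lemma 2 that $\Pi_n(A_{r\varepsilon_n} \setminus \mathcal{G}_n) \to 0$ almost surely as $n \to \infty$. So it suffices to prove that $\Pi_n(\mathcal{G}_n \cap A_{r\varepsilon_n}) \to 0$ almost surely as $n \to \infty$. By the definition of $J(\varepsilon_n, \mathcal{G}_n, \alpha)$, for each fixed $n$ there exist functions $f_1, f_2, \dots, f_N$ in $L_\mu$ such that $\mathcal{G}_n \cap A_{r\varepsilon_n} \subset \bigcup_{j=1}^N B_j$, where $B_j = \mathcal{G}_n \cap A_{r\varepsilon_n} \cap \{f : H(f_j, f) < \varepsilon_n\}$ and $\sum_{j=1}^N \Pi(B_j)^\alpha \le 2e^{J(\varepsilon_n, \mathcal{G}_n, \alpha)}$. It is no restriction to assume that all the sets $B_j$ are disjoint and nonempty. Taking an $f_j^* \in B_j$ we get that $H(f_j, f_0) \ge H(f_j^*, f_0) - H(f_j^*, f_j) \ge (r-1)\,\varepsilon_n$. Now for each $B_j$ we have

$$\Pi_n(B_j) = \frac{\int_{B_j} R_n(f)\,\Pi(df)}{\int_{\mathbb{F}} R_n(f)\,\Pi(df)} = \frac{\Pi(B_j)\prod_{k=0}^{n-1} f_{kB_j}(X_{k+1})/f_0(X_{k+1})}{\int_{\mathbb{F}} R_n(f)\,\Pi(df)},$$

where $f_{kB_j}(x) = \int_{B_j} f(x)\,R_k(f)\,\Pi(df) \big/ \int_{B_j} R_k(f)\,\Pi(df)$ and $R_0(f) = 1$. The function $f_{kB_j}$ was introduced by Walker (2004) and can be considered as the predictive density of $f$ with a normalized posterior distribution, restricted on the set $B_j$. Clearly, Jensen's inequality yields that $H(f_{kB_j}, f_j)^2 \le \varepsilon_n^2$ for each $k$. Hence $H(f_{kB_j}, f_0) \ge H(f_j, f_0) - H(f_j, f_{kB_j}) \ge (r-2)\,\varepsilon_n > 0$. Since $\sum_{n=1}^\infty e^{-n\varepsilon_n^2 c_2} < \infty$, it turns out from Lemma 1 and the first Borel-Cantelli Lemma that $\int_{\mathbb{F}} R_n(f)\,\Pi(df) \ge e^{-n\varepsilon_n^2(3+2c_2)}\,\Pi(W_{\varepsilon_n})$ almost surely for all $n$ large enough. Hence, by condition (3) we obtain that

$$\Pi_n(\mathcal{G}_n \cap A_{r\varepsilon_n}) \le \Pi_n(\mathcal{G}_n \cap A_{r\varepsilon_n})^\alpha \le \Big(\sum_{j=1}^N \Pi_n(B_j)\Big)^\alpha \le \sum_{j=1}^N \Pi_n(B_j)^\alpha$$

$$= \frac{\sum_{j=1}^N \Pi(B_j)^\alpha \prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha}}{\big(\int_{\mathbb{F}} R_n(f)\,\Pi(df)\big)^\alpha} \le \frac{\sum_{j=1}^N \Pi(B_j)^\alpha \prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha}}{\big(\Pi(W_{\varepsilon_n})\,e^{-n\varepsilon_n^2(3+2c_2)}\big)^\alpha}$$

$$\le e^{n\varepsilon_n^2(3+2c_2+c_3)\alpha} \sum_{j=1}^N \Pi(B_j)^\alpha \prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha}$$

almost surely for all $n$ large enough. Since $r > 2 + \sqrt{\frac{2(3\alpha + 2\alpha c_2 + \alpha c_3 + c_1)}{1-\alpha}}$, the inequality $(3 + 2c_2 + c_3)\,\alpha < \frac12 (r-2)^2 (1-\alpha) - c_1$ holds. Take a constant $b$ with $(3 + 2c_2 + c_3)\,\alpha < b < \frac12 (r-2)^2 (1-\alpha) - c_1$. Denote $\mathcal{F}_k = \sigma\{X_1, X_2, \dots, X_k\}$. Then we have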



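$$F_0^\infty\Big\{\sum_{j=1}^N \Pi(B_j)^\alpha \prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha} \ge 2e^{-bn\varepsilon_n^2}\Big\} \le \frac12\,e^{bn\varepsilon_n^2} \sum_{j=1}^N \Pi(B_j)^\alpha\,E\prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha}$$

$$\le e^{bn\varepsilon_n^2 + J(\varepsilon_n, \mathcal{G}_n, \alpha)} \max_{1\le j\le N} E\Big(\prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha}\Big),$$

where

$$E\prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha} = E\Big[\prod_{k=0}^{n-2} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha}\; E\Big(\frac{f_{n-1B_j}(X_n)^\alpha}{f_0(X_n)^\alpha}\,\Big|\,\mathcal{F}_{n-1}\Big)\Big].$$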



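By the conditional Hölder's inequality we get that with probability one,

$$E\Big(\frac{f_{n-1B_j}(X_n)^\alpha}{f_0(X_n)^\alpha}\,\Big|\,\mathcal{F}_{n-1}\Big) = E\Big(\frac{f_{n-1B_j}(X_n)^{\alpha/2}}{f_0(X_n)^{\alpha/2}} \cdot \frac{f_{n-1B_j}(X_n)^{\alpha/2}}{f_0(X_n)^{\alpha/2}}\,\Big|\,\mathcal{F}_{n-1}\Big)$$

$$\le E\Big(\Big(\frac{f_{n-1B_j}(X_n)}{f_0(X_n)}\Big)^{\frac{\alpha}{2}\cdot\frac{2}{2-\alpha}}\,\Big|\,\mathcal{F}_{n-1}\Big)^{\frac{2-\alpha}{2}}\; E\Big(\Big(\frac{f_{n-1B_j}(X_n)}{f_0(X_n)}\Big)^{\frac{\alpha}{2}\cdot\frac{2}{\alpha}}\,\Big|\,\mathcal{F}_{n-1}\Big)^{\frac{\alpha}{2}}$$

$$= E\Big(\Big(\frac{f_{n-1B_j}(X_n)}{f_0(X_n)}\Big)^{\frac{\alpha}{2-\alpha}}\,\Big|\,\mathcal{F}_{n-1}\Big)^{\frac{2-\alpha}{2}},$$

where the last equality holds because $E\big(f_{n-1B_j}(X_n)/f_0(X_n)\,\big|\,\mathcal{F}_{n-1}\big) = \int f_{n-1B_j}\,d\mu = 1$.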

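Take the integer $m$ with $\frac{\alpha}{1-\alpha} \le 2^m < \frac{2\alpha}{1-\alpha}$. Repeating the above procedure $m - 1$ more times we obtain that with probability one,

$$E\Big(\frac{f_{n-1B_j}(X_n)^\alpha}{f_0(X_n)^\alpha}\,\Big|\,\mathcal{F}_{n-1}\Big) \le E\Big(\Big(\frac{f_{n-1B_j}(X_n)}{f_0(X_n)}\Big)^{\frac{\alpha}{2^m(1-\alpha)+\alpha}}\,\Big|\,\mathcal{F}_{n-1}\Big)^{\frac{2^m(1-\alpha)+\alpha}{2^m}},$$

which by the conditional Hölder's inequality is less than

$$E\Big(\sqrt{\frac{f_{n-1B_j}(X_n)}{f_0(X_n)}}\,\Big|\,\mathcal{F}_{n-1}\Big)^{\alpha 2^{1-m}} = \Big(\int \sqrt{f_{n-1B_j}(X_n)\,f_0(X_n)}\;\mu(dX_n)\Big)^{\alpha 2^{1-m}} = \Big(1 - \frac{H(f_{n-1B_j}, f_0)^2}{2}\Big)^{\alpha 2^{1-m}}$$

$$\le \Big(1 - \frac{(r-2)^2\varepsilon_n^2}{2}\Big)^{\alpha 2^{1-m}} \le e^{-2^{-m}(r-2)^2\alpha\,\varepsilon_n^2} \le e^{\frac12 (r-2)^2(\alpha-1)\,\varepsilon_n^2}.$$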
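Hence we have

$$E\prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha} \le e^{\frac12 (r-2)^2(\alpha-1)\,\varepsilon_n^2}\; E\prod_{k=0}^{n-2} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha}.$$

Repeating the same argument $n - 1$ times, we obtain that for each $j$,

$$E\prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha} \le e^{\frac12 (r-2)^2(\alpha-1)\,n\varepsilon_n^2}.$$

Therefore, we obtain that for all $n$,

$$F_0^\infty\Big\{\sum_{j=1}^N \Pi(B_j)^\alpha \prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha} \ge 2e^{-bn\varepsilon_n^2}\Big\} \le e^{J(\varepsilon_n, \mathcal{G}_n, \alpha) + \left(b + \frac12 (r-2)^2(\alpha-1)\right) n\varepsilon_n^2} \le e^{J(\varepsilon_n, \mathcal{G}_n, \alpha) - c_1 n\varepsilon_n^2}.$$

Thus, together with condition (1), the first Borel-Cantelli Lemma yields that

$$\sum_{j=1}^N \Pi(B_j)^\alpha \prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha} \le 2e^{-bn\varepsilon_n^2}$$

almost surely for all $n$ large enough. Hence we have

$$\Pi_n(\mathcal{G}_n \cap A_{r\varepsilon_n}) \le 2e^{n\varepsilon_n^2(3\alpha + 2\alpha c_2 + \alpha c_3 - b)},$$

which tends to zero as $n \to \infty$, since $(3 + 2c_2 + c_3)\,\alpha < b$ and $n\varepsilon_n^2 \to \infty$ as $n \to \infty$. The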
proof of Theorem 1 is complete. 
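The exponent bookkeeping in the repeated Hölder step can be traced with exact rational arithmetic. This is a minimal sketch under two assumptions that are not stated verbatim in the text: that the integer $m$ is chosen with $\alpha/(1-\alpha) \le 2^m < 2\alpha/(1-\alpha)$, and that $\alpha > 1/2$ so that $m \ge 1$. The function names and the value $\alpha = 3/4$ are illustrative.

```python
from fractions import Fraction

def pick_m(alpha):
    """Smallest integer m >= 1 with 2^m >= alpha / (1 - alpha); assumes 1/2 < alpha < 1."""
    m = 1
    while 2 ** m < alpha / (1 - alpha):
        m += 1
    return m

def holder_chain(alpha, m):
    """Apply the step a -> a / (2 - a) with outer power (2 - a) / 2, m times.

    Returns the inner exponent a_m and the accumulated outer power.
    """
    a, power = alpha, Fraction(1)
    for _ in range(m):
        power *= (2 - a) / 2
        a = a / (2 - a)
    return a, power

alpha = Fraction(3, 4)                  # illustrative value of alpha
m = pick_m(alpha)
a_m, power = holder_chain(alpha, m)

# One more Hoelder (Jensen) step moves the inner exponent a_m <= 1/2 up to 1/2,
# which multiplies the outer power by 2 * a_m.
final_exponent = 2 * a_m * power

print(m, a_m, power, final_exponent)
```

For this choice the inner exponent after $m$ steps is $\alpha/(2^m(1-\alpha)+\alpha)$, the outer power is $(2^m(1-\alpha)+\alpha)/2^m$, and the final exponent on the conditional expectation of $\sqrt{f_{n-1B_j}/f_0}$ equals $\alpha 2^{1-m}$. The upper bound on $2^m$ is what gives $2^{-m}\alpha \ge (1-\alpha)/2$, the inequality needed for the last step of the display.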

To prove Theorem 2, we need a replacement of Lemma 2 under weaker conditions. 

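Lemma 3. Let $c_2 > 0$ and let $\{\varepsilon_n\}_{n=1}^\infty$ be a positive sequence such that $\Pi(R_{\varepsilon_n^2}) \ge e^{-n\varepsilon_n^2 c_2}$ for all $n$. If a sequence $\{D_n\}_{n=1}^\infty$ of subsets in $\mathbb{F}$ satisfies $e^{n\varepsilon_n^2(2+c_2)}\,\Pi(D_n) \to 0$ as $n \to \infty$, then $\Pi_n(D_n) \to 0$ in probability as $n \to \infty$.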

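Proof. From Lemma 1 of Shen and Wasserman (2001) or Lemma 8.1 of Ghosal et al. (2000) it turns out that we have, with probability tending to 1,

$$\int_{\mathbb{F}} R_n(f)\,\Pi(df) \ge e^{-n\varepsilon_n^2(2+c_2)}.$$

Hence for any given $\delta > 0$ we have that

$$F_0^\infty\{\Pi_n(D_n) \ge \delta\} \le F_0^\infty\Big\{e^{n\varepsilon_n^2(2+c_2)}\int_{D_n} R_n(f)\,\Pi(df) \ge \delta\Big\} + o(1)$$

$$\le \frac{1}{\delta}\,e^{n\varepsilon_n^2(2+c_2)}\,E\int_{D_n} R_n(f)\,\Pi(df) + o(1) = \frac{1}{\delta}\,e^{n\varepsilon_n^2(2+c_2)}\,\Pi(D_n) + o(1),$$

which tends to zero as $n \to \infty$ and concludes the proof of Lemma 3.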

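Proof of Theorem 2. Assume that $0 < \alpha < 1$. The proof of Theorem 2 follows along the same lines as the proof of Theorem 1. By Lemma 3 it suffices to prove that $\Pi_n(\mathcal{G}_n \cap A_{r\varepsilon_n}) \to 0$ in probability as $n \to \infty$. For any given $\delta > 0$, following the proof of Theorem 1 we get

$$F_0^\infty\{\Pi_n(\mathcal{G}_n \cap A_{r\varepsilon_n}) \ge \delta\} \le \frac{1}{\delta}\,e^{n\varepsilon_n^2(2+c_2)\alpha} \sum_{j=1}^N \Pi(B_j)^\alpha\,E\prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha} + o(1)$$

$$\le \frac{2}{\delta}\,e^{J(\varepsilon_n, \mathcal{G}_n, \alpha) + n\varepsilon_n^2(2+c_2)\alpha} \max_{1\le j\le N} E\prod_{k=0}^{n-1} \frac{f_{kB_j}(X_{k+1})^\alpha}{f_0(X_{k+1})^\alpha} + o(1)$$

$$\le \frac{2}{\delta}\,e^{J(\varepsilon_n, \mathcal{G}_n, \alpha) + n\varepsilon_n^2\left((2+c_2)\alpha + \frac12 (r-2)^2(\alpha-1)\right)} + o(1)$$

$$\le \frac{2}{\delta}\,e^{J(\varepsilon_n, \mathcal{G}_n, \alpha) - n\varepsilon_n^2 c_1} + o(1) \to 0 \quad \text{as } n \to \infty,$$

where the last inequality follows from $r > 2 + \sqrt{\frac{2(2\alpha + \alpha c_2 + c_1)}{1-\alpha}}$. The proof of Theorem 2 is complete.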

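Proof of Theorem 3. Since $\Pi_n(A_{r_n\varepsilon_n}) \le \Pi_n(\mathcal{G}_n \cap A_{r_n\varepsilon_n}) + \Pi_n(A_{r_n\varepsilon_n} \setminus \mathcal{G}_n)$, it suffices to show that both terms on the right-hand side tend to zero in probability. Given $\delta > 0$, the proof of Lemma 3 implies that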

which by condition (1) tends to zero as $n \to \infty$. Let $[r_n]$ denote the largest integer less than or equal to $r_n$ and let $D_j = \{f \in \mathcal{G}_n : j\varepsilon_n \le H(f_0, f) < 2j\varepsilon_n\}$ (indeed, $D_j$ is an empty set for $j > \sqrt{2}/\varepsilon_n$ since the Hellinger distance cannot exceed $\sqrt{2}$). Then we have

CXD 
oo ^ oo ^ oo

Take a partition $\bigcup_{i=1}^{N_j} D_{ji}$ of each $D_j$ such that $D_{ji} \subset \{f : H(f_{ji}, f) < j\varepsilon_n/4\}$ for some $f_{ji}$ in $L_\mu$ and



-"J 

i=l 

where the last inequality follows from condition (2). Using the same argument as in the proof of Theorem 1, one can get $H(f_{kD_{ji}}, f_0) \ge j\varepsilon_n/2$ and hence we have, with probability tending to 1,



oo yV, 



a 

i = [-r„] J=l 



,2anel 



j = [r„] i=l ^ ^"^ j = [r„] i=l 



^ V^ „=-2^o L ^. .-2 I 1 ,-2/ ,_iNN _ 2 






^ 3 ^ 1 



3 



which tends to zero as $n \to \infty$, since $n\varepsilon_n^2$ is uniformly bounded away from zero and $r_n \to \infty$. Here the second inequality follows from $\alpha \in (0, 1)$, the third from the proof of Theorem 1, and the fifth from the elementary inequality $e^{-x} \le 1/x$ for $x > 0$; some of the inequalities hold only for all large $n$. Thus, we have proved that $\Pi_n(\mathcal{G}_n \cap A_{r_n\varepsilon_n})$ converges to zero in probability. The proof of Theorem 3 is complete.



Proof of Theorem 4. From $n\varepsilon_n^2 \ge c_0 \log n$ and $c_0 c_2 > 1$ it turns out that $e^{-n\varepsilon_n^2 c_2} \le 1/n^{c_0 c_2}$ and hence, by the first Borel-Cantelli Lemma and Lemma 1, we obtain that

$$\int_{\mathbb{F}} R_n(f)\,\Pi(df) \ge e^{-n\varepsilon_n^2(3+2c_2)}\,\Pi(W_{\varepsilon_n})$$

almost surely for all large $n$. Then, following the proofs of Lemma 3 and Theorem 3, one can get that for any $\delta > 0$ and $r > 1$,
< ^ '^" ^ ^"-^ _L Z \^ pne^((3+2c2)a+cij +^j (a-1)) — , t 



SU{W,J 6 



l-a 



Condition (1) yields that Xl^i '^n < oo. On the other hand, since ci < ^^, we have that 

TS^) (-^ - 1) < -i: 



for aU n > 2 and for all r so large that (3 + 2c2)a + (ci + ^r^) (r^ - 1) < - — , 



6^ < — Yl e"^"^"i+T^)^ = ■ 



4^co((3+2c2)a+(ci + i^)[r]2) ^ ^cp ((3+2c2)«+(ci + ^) (r^-l)) ^ ^_ 

< ^ < -, < 



and hence $\sum_{n=1}^\infty b_n < \infty$. Thus, by the first Borel-Cantelli Lemma we obtain that $\Pi_n(A_{r\varepsilon_n}) < \delta$ almost surely for all large $n$, which concludes the proof of Theorem 4.